fix(rivetkit): prevent sleep races during disconnect and db work #4644

Open
NathanFlurry wants to merge 1 commit into fix/isolate-engine-envoys from fix/prevent-sleep-db-races

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 13, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Apr 13, 2026

Code Review

Overview

This PR fixes a race condition in the actor sleep lifecycle where an actor could enter sleep prematurely when a connection is being disconnected but the onDisconnect callback has not yet completed.

Two targeted changes:

  1. connection-manager.ts: Calls resetSleepTimer() immediately after pendingDisconnectCount increments, so the timer is re-evaluated with the pending disconnect work counted.
  2. mod.ts: Adds a safety guard inside the setTimeout callback. If canSleep() is no longer Yes when the timer fires, it re-arms the timer instead of proceeding to sleep.
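
The two changes can be sketched in a simplified model. This is an illustration under stated assumptions, not the rivetkit source: the class `ActorSleepModel`, the `ActiveConns` variant, the `fireTimer()` and `isTimerArmed()` hooks, and the callback that stands in for the `setTimeout` handle are all hypothetical. The sketch also assumes that `resetSleepTimer()` only arms a timer when `canSleep()` returns `CanSleep.Yes`.

```typescript
// Simplified model of the sleep logic described in this review; not the rivetkit source.
const CanSleep = {
  Yes: "Yes",
  ActiveConns: "ActiveConns", // hypothetical variant name
  ActiveDisconnectCallbacks: "ActiveDisconnectCallbacks",
} as const;
type CanSleep = (typeof CanSleep)[keyof typeof CanSleep];

class ActorSleepModel {
  connections = new Set<string>();
  pendingDisconnectCount = 0;
  asleep = false;
  // Hypothetical stand-in for the setTimeout handle: the armed callback, or undefined.
  private armed: (() => void) | undefined;

  canSleep(): CanSleep {
    if (this.connections.size > 0) return CanSleep.ActiveConns;
    if (this.pendingDisconnectCount > 0) return CanSleep.ActiveDisconnectCallbacks;
    return CanSleep.Yes;
  }

  resetSleepTimer(): void {
    this.armed = undefined; // cancel any previously armed timer
    if (this.canSleep() !== CanSleep.Yes) return; // assumption: arm only when sleep is allowed
    this.armed = () => {
      this.armed = undefined;
      // Change 2 (mod.ts): state may have changed since the timer was armed,
      // so re-check at fire time and re-evaluate instead of sleeping.
      if (this.canSleep() !== CanSleep.Yes) {
        this.resetSleepTimer();
        return;
      }
      this.startSleep();
    };
  }

  startSleep(): void {
    this.asleep = true;
  }

  // Hypothetical hook standing in for the timeout elapsing.
  fireTimer(): void {
    this.armed?.();
  }

  isTimerArmed(): boolean {
    return this.armed !== undefined;
  }

  async connDisconnected(id: string, onDisconnect: () => Promise<void>): Promise<void> {
    this.connections.delete(id);
    this.pendingDisconnectCount++;
    // Change 1 (connection-manager.ts): re-evaluate right after the increment
    // so a timer armed earlier is cancelled while the callback runs.
    this.resetSleepTimer();
    try {
      await onDisconnect();
    } finally {
      this.pendingDisconnectCount--;
      this.resetSleepTimer(); // second call: sleep may now be allowed again
    }
  }
}
```

In this model, either change alone keeps `startSleep()` from running while `pendingDisconnectCount` is non-zero; the guard covers paths that mutate state without calling `resetSleepTimer()`.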

Race Condition Being Fixed

Before this fix:

  1. The last connection is removed from connections, so canSleep() returns CanSleep.Yes.
  2. resetSleepTimer() is called from an earlier call site and arms a sleep timer.
  3. pendingDisconnectCount increments, but the timer is already armed.
  4. The timer fires and startSleep() is called while the onDisconnect callback is still running.

The fix ensures that after incrementing pendingDisconnectCount, the timer is immediately reset so the next canSleep() evaluation returns CanSleep.ActiveDisconnectCallbacks.

The guard in the setTimeout callback provides defense-in-depth for any remaining timing gaps.
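
The ordering above can be traced step by step with a small pure model. This is a hedged sketch: `traceDisconnect`, `ActorState`, `SleepCheck`, the `"ActiveConnections"` value, and the boolean `timerArmed` are hypothetical names for illustration, and the trace deliberately omits the fire-time guard so that only the effect of the extra `resetSleepTimer()` call is visible.

```typescript
// Pure trace of the four steps; not the rivetkit source.
type SleepCheck = "Yes" | "ActiveConnections" | "ActiveDisconnectCallbacks";

interface ActorState {
  connections: number;
  pendingDisconnectCount: number;
}

function canSleep(s: ActorState): SleepCheck {
  if (s.connections > 0) return "ActiveConnections";
  if (s.pendingDisconnectCount > 0) return "ActiveDisconnectCallbacks";
  return "Yes";
}

// resetAfterIncrement = false models the pre-fix ordering, true models the fix.
function traceDisconnect(resetAfterIncrement: boolean): string[] {
  const log: string[] = [];
  const s: ActorState = { connections: 1, pendingDisconnectCount: 0 };
  let timerArmed = false;
  // Assumption: the timer is armed only when canSleep() reports "Yes".
  const resetSleepTimer = (): void => {
    timerArmed = canSleep(s) === "Yes";
  };

  s.connections = 0; // step 1: last connection removed
  log.push(`1. connection removed: canSleep=${canSleep(s)}`);

  resetSleepTimer(); // step 2: earlier call site arms the timer
  log.push(`2. timer armed=${timerArmed}`);

  s.pendingDisconnectCount++; // step 3: callback work is now pending
  if (resetAfterIncrement) resetSleepTimer();
  log.push(`3. count incremented: canSleep=${canSleep(s)}, timer armed=${timerArmed}`);

  // step 4: the timer can only fire if it is still armed
  log.push(
    timerArmed
      ? "4. timer fires: startSleep() during onDisconnect"
      : "4. no timer armed: sleep deferred",
  );
  return log;
}
```

Calling `traceDisconnect(false)` ends with the timer still armed at step 4, while `traceDisconnect(true)` shows the timer cancelled at step 3.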


Code Quality

Strengths:

  • Minimal diff (5 lines added) -- touches only what needs changing.
  • Fix is placed directly after the state mutation it guards against.
  • The double resetSleepTimer() call in connDisconnected (once after increment, once in finally after decrement) is intentional and correct.
  • The timer callback guard is consistent with how resetSleepTimer() already re-evaluates state.

Suggestions:

  1. Add a comment to the timer guard. The re-check inside the setTimeout callback is non-obvious. A brief inline comment explaining why canSleep() needs to be re-evaluated at fire time (state may have changed since the timer was armed) would help future readers.

  2. Fill out the PR description. The checklist and test/motivation fields are empty. Even a single sentence describing the race would be valuable for git history.

  3. Regression test (optional). The existing sleep tests do not cover this specific race. A test that closes a connection and asserts the actor does not enter sleep before onDisconnect completes would prevent future regressions, though the timing-dependent nature makes this hard to write reliably.
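
One way to reduce the timing dependence of such a regression test is to gate `onDisconnect` on a manually resolved promise and to trigger the sleep timer explicitly. The sketch below assumes a test surface that the review does not describe: the `SleepProbe` interface, `deferred()`, `assertNoSleepDuringDisconnect()`, and `FakeActor` are hypothetical, and `FakeActor` is a stand-in for the real actor, not the rivetkit test harness.

```typescript
// Hypothetical minimal surface a test would need from an actor under test.
interface SleepProbe {
  disconnect(onDisconnect: () => Promise<void>): Promise<void>;
  fireSleepTimer(): void; // stands in for the sleep timeout elapsing
  isAsleep(): boolean;
}

// A promise whose resolution is controlled by the test.
function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

async function assertNoSleepDuringDisconnect(actor: SleepProbe): Promise<void> {
  const gate = deferred();
  const done = actor.disconnect(() => gate.promise); // onDisconnect is now pending

  actor.fireSleepTimer();
  if (actor.isAsleep()) {
    throw new Error("actor slept while onDisconnect was still running");
  }

  gate.resolve(); // let onDisconnect complete
  await done;

  actor.fireSleepTimer();
  if (!actor.isAsleep()) {
    throw new Error("actor should be able to sleep after onDisconnect completes");
  }
}

// Stand-in actor with the fixed behavior, so the helper can run on its own.
class FakeActor implements SleepProbe {
  private pending = 0;
  private armed = false;
  private asleep = false;

  private resetSleepTimer(): void {
    this.armed = this.pending === 0;
  }

  async disconnect(onDisconnect: () => Promise<void>): Promise<void> {
    this.pending++;
    this.resetSleepTimer();
    try {
      await onDisconnect();
    } finally {
      this.pending--;
      this.resetSleepTimer();
    }
  }

  fireSleepTimer(): void {
    // Fire-time guard: only sleep if the timer is armed and no work is pending.
    if (this.armed && this.pending === 0) {
      this.armed = false;
      this.asleep = true;
    }
  }

  isAsleep(): boolean {
    return this.asleep;
  }
}
```

Because the test decides when `onDisconnect` resolves and when the timer fires, the assertion does not depend on wall-clock delays. Against the real actor, the timer trigger would likely need fake timers or an equivalent hook.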


Verdict

The logic is correct and the fix is well-targeted. No functional concerns. The main suggestion is a short comment on the timer guard in mod.ts to document the intent.
